Skip to content

Conversation

@javanna
Copy link
Member

@javanna javanna commented Apr 2, 2025

There seems to be some overhead observed in some of our benchmarks around creating an IndexSearcher. In many places, like can match, we acquire a searcher but all we need is an index reader. This small change tries to verify some assumptions by removing the slight slicing overhead caused by looping over the segments when we create a searcher without providing an executor to its constructor. We may iterate on this approach alter, but first I'd like to see the impact of this on our nightly benchmarks.

) {
super(reader);
// pass in an executor so we can shortcut and avoid looping through all segments
super(reader, source.equals(CAN_MATCH_SEARCH_SOURCE) ? Runnable::run : null);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's many more scenarios where all we need is an index reader. This is quite hard to untangle due to the many acquireSearcher callers and different variants of the method, wrapping going on as well. I am tempted to add an acquireIndexReader method to Engine, but this requires significant cleanup to become effective and better than what we currently have.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, if there is significant overhead on creating a searcher, perhaps that's something to fix in Lucene rather than working around in ES. What triggered this change is suspicions around overhead caused by creating slices eagerly, mostly stuff that happens in the IndexSearcher constructor. We should figure out how to minimize that in Lucene?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and apache/lucene#14343 should help quite a bit on this front. I wonder though, maybe the problem isn't so much with the slice cost but with the fact that we don't really reuse searcher instances much. We don't really need a new searcher (for e.g. can_match) across queries so long as we don't see new data do we?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is definitely a possible improvement, but what I am trying to do is figure out the cause of a regression. We have not changed how many times we create a searcher, but we did add overhead to the creation of a searcher.

@javanna javanna changed the title WIP: attempt to make acquireSearch lighter WIP: attempt to make acquireSearcher lighter for can match Apr 3, 2025
@javanna javanna marked this pull request as ready for review April 3, 2025 13:58
@javanna javanna changed the title WIP: attempt to make acquireSearcher lighter for can match Attempt to make acquireSearcher lighter for can match Apr 3, 2025
@javanna javanna added :Search Foundations/Search Catch all for Search Foundations >non-issue labels Apr 3, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Apr 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

Copy link
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks!

}

@Override
protected void searchLeaf(LeafReaderContext ctx, int minDocId, int maxDocId, Weight weight, Collector collector)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious - why did you add the assertion in this method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I want to make sure that when we shortcut the slicing, searchLeaf is never called. That's the assumption based on which we can shortcut the slices to an empty array, because a search will never be executed on top of this searcher instance.

Copy link
Contributor

@benchaplin benchaplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you still want to merge this!

@javanna javanna closed this Jul 10, 2025
@javanna javanna deleted the fix/lighter_searcher branch July 10, 2025 09:39
@javanna javanna removed the v9.2.0 label Oct 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants